ABSTRACT

Recent studies have emphasized the significant implications
of multimedia for the learning process. Even those researchers who take a
less enthusiastic stand do not hesitate to assent that when blended with 
suitable pedagogical techniques and a proper design, the combined use of 
multiple media for the viewing and study of educational material can enhance 
the quality of the learning environment. This paper presents research and 
development efforts to design a pedagogically sound multimedia Web-based 
learning interface. Taken into consideration are recent findings from the 
area of cognitive psychology regarding the use of text, animation, and voice. 
The logic is based mainly on the presentation modality effect and the 
principles of spatial and temporal contiguity. The state of the learner's 
prior knowledge is considered to be of critical importance. The paper 
presents a flexible user interface supporting the use of animation and voice 
as default modalities and the existence of text in the same interface as an 
essential part of the learning process. The SMIL language is used for the 
implementation of the Web-based interface. (Contains 21 references.) 

Abstract: In this article we present our research and development efforts to design a
pedagogically sound multimedia web-based learning interface. We take into 
consideration recent findings from the area of cognitive psychology regarding the use 
of text, animation and voice. The logic is based mainly on the presentation modality 
effect and the principles of spatial and temporal contiguity. We consider the state of 
the learner’s prior knowledge to be of critical importance. We present a flexible user 
interface supporting the use of animation and voice as default modalities and the 
existence of text in the same interface as an essential part of the learning process. The 
SMIL language is used for the implementation of the Web-based interface. 

Introduction

Recent studies have emphasized the significant implications of multimedia for the learning process. Even
those researchers who take a less enthusiastic stand do not hesitate to assent that when blended with
suitable pedagogical techniques and a proper design, the combined use of multiple media for the viewing 
and study of educational material can enhance the quality of the learning environment. In this article we 
discuss the theoretical background and the development efforts which have led us to create a flexible 
interface design that is used for a series of graduate and post-graduate courses. 

We distinguish two stages in the learning process: 

• the presentation of the instructional material, for which we can assume learners have no prior
knowledge, and

• that material’s study and critical investigation.

We seek to provide integrated support for both stages via the same user interface, which combines
animation, voice, and text labels for the presentation of the course material to
the learners and, alternatively, hypertext and pictures for its in-depth scrutiny. We analyze the
reasons that have led us to adopt this particular interface design, based on the findings of
cognitive psychology studies. Commencing with the notion of working memory, we look for ways to
optimize the use of its limited resources and use them as guidelines for our design decisions. Understanding
working memory limitations and sensory modality benefits is essential in order to create courses that
reduce cognitive load and enhance the learning process. To this end, cognitive psychology theories such
as Paivio’s dual coding theory (1971, 1986, 1991) and Baddeley’s model of working memory (1986, 1992),
as well as recent evidence provided by cognitive load psychologists, can serve as the basis for the
instructional design of multimedia courses that facilitate learning.

The flexibility of the user interface is a crucial matter. Learning outcomes can be increased by creating an 
interface that provides a combination of modes and modalities from which to select, according to the prior 
knowledge of the learner. 

The Internet has been chosen as the implementation environment because of the additional capabilities it
offers the learner, and SMIL has been selected because of its ability to synchronize and coordinate
diverse multimedia elements.



Finding ways to overcome the limits of working memory 

According to the traditional three-store model of memory (Atkinson and Shiffrin, 1968), memory can be
conceptualized as follows: a) one part of memory, known as sensory memory, is capable of
storing limited amounts of information for very brief periods of time; b) a second component, short-term
memory, is capable of storing information for somewhat longer periods of time but is also of relatively
limited capacity; and c) a final constituent, called long-term memory, is of very large capacity and capable
of storing information for very long periods of time, perhaps even indefinitely.

Working memory is defined as the part of long-term memory that comprises all the
knowledge recently activated in memory, including short-term memory. This implies that
memory can be pictured as three concentric circles, the inner one corresponding to short-term memory, the
intermediate circle to working memory and the exterior one to long-term memory. Information resides
within long-term memory and, when activated, moves into long-term memory’s specialized working
memory, which actively moves information into and out of the short-term memory contained within it
(Sternberg, 1996).

Many memory theorists have assumed that working memory comprises multiple memory systems, which
are most frequently associated with auditory or visual processing. For example, one of the most widely accepted
models of working memory, proposed by Alan Baddeley (1990), consists of at least the following: a) a
visuospatial sketchpad, which briefly holds and deals with visual images, and b) an articulatory
(phonological) loop, which briefly holds inner speech for verbal comprehension and for processing verbal
information. It is believed that these two systems process their different types of information in a largely
independent and parallel fashion.

In relation to the role and characteristics of working memory, our basic suppositions are the following: a)
working memory has a limited capacity and duration, that is, we are able to hold and process only a few
items of information at a time; b) working memory includes an auditory working memory and a visual
working memory, according to Baddeley’s theory; c) the two systems operate in parallel; and d) meaningful
learning occurs when a learner retains relevant information in each system and is able to make referential
connections between them (Mayer & Anderson, 1991).

Prior knowledge, Schema acquisition and Chunking 

An average person can retain only a few chunks of information at a time in working memory, which
is believed to be capable of storing seven plus or minus two chunks of information at a time
(Miller, 1956). In other words, we can think about only five to nine distinct items at any given time. A
general way to overcome the problem of limited capacity of working memory is by creating a schema. A 
schema is identified as a cognitive construct that allows us to treat multiple elements of information as a 
single element classified according to the manner in which it will be used (Bagui, 1998). Therefore, a 
schema puts less pressure on working memory, facilitating understanding. Schema acquisition is 
facilitated by the existence of prior knowledge. Learners who know a great deal about a subject have more 
well-developed schemata for incorporating new knowledge. 

Dual coding and the sensory modality effect 

With his dual-coding theory, Allan Paivio (Paivio, 1971, 1986, 1991; Clark & Paivio, 1991) suggested that
information is processed through one of two generally independent channels, modes or codes. That is, our 
imaginal and verbal mental representations may be viewed as two different codes (analogue and symbolic), 
which organize information into knowledge. Learning is better when information is processed through two 
channels instead of one. 

Connections can be made only if corresponding nonverbal and verbal information is in working memory
at the same time. Processing information through two channels is called referential processing and has an
additive effect on recall (Mayer & Anderson, 1991; Paivio, 1967, 1991; Paivio & Csapo, 1973). This
happens because the learner is able to create more cognitive paths that can be followed to retrieve the
information.

The research to date thus suggests that dual-mode input (verbal and nonverbal) helps people learn. But 
humans can also input information through various sensory modalities. Recent studies have addressed the 
problem of the combination of modes and modalities that should be preferred in order to promote 
meaningful learning. For example, should an animation (nonverbal mode, visual modality) be 
accompanied by an explanation presented as a narrative (verbal mode, auditory modality), an explanation 
presented as on-line text (verbal mode, visual modality), or by both of these forms of explanation 
simultaneously? 

Drawing from Baddeley’s (1992) theory of working memory and Sweller’s (1988, 1989; Chandler & 
Sweller, 1992; Sweller, Chandler, Tierney, & Cooper, 1990) cognitive load theory, several researchers 
have shown that working memory capacity can be enlarged by using dual-modality presentation 
techniques (Mousavi, Low, & Sweller, 1995; Mayer & Moreno, 1998, 1999, 2000). 

This evidence supports the view that instructional designers should not only be concerned about
combining verbal and nonverbal information consistent with Paivio’s dual coding theory; it is also essential
that they take into consideration the role of sensory modalities when designing multimedia presentations
with pictures and words. Mixed modality presentations are superior to even the most integrated text and visual
presentations (Moreno & Mayer, 1999).

The contiguity principle 

The contiguity principle was proposed by Mayer & Anderson (1992). It holds that the
effectiveness of multimedia instruction increases when words and pictures are presented contiguously in time or
space.

They proposed the use of the term spatial-contiguity effect to refer to the learning enhancement that results
when text and pictures are physically integrated or close to each other rather than physically
separated. One interpretation of this result is that, when text and pictures are separated, students may miss part
of the visual information while they are reading the on-screen text or, vice versa, miss portions of the text
while focusing on the pictures. The separation forces the learner to search for relations between them, and the
cognitive load associated with this search is extraneous (Mousavi, Low, & Sweller, 1995).

Similarly, the term temporal-contiguity effect has been proposed for cases when visual and spoken 
materials are temporally synchronized resulting again in the enhancement of learning. In these cases, 
learners are able to hold a visual representation in visual working memory and a corresponding verbal 
representation in verbal working memory at the same time, allowing them to build referential connections 
between them, consistent with the dual coding theory. Therefore, differences in synchronicity between 
verbal and nonverbal materials that need to be integrated in a lesson also affect learning. 

Evidence associated with the spatial contiguity effect is provided by several studies (Mayer, 1989a, 1989b;
Mayer and Gallini, 1990).

Therefore, another critical aspect of designing instructional software is to integrate the
corresponding pictorial and verbal information in a multimedia lesson as much as possible, both spatially
and temporally.

Two alternative combinations for presenting information based on the learner’s 
prior knowledge 

The previous discussion has led us to adopt two different and complementary approaches for presenting 
instructional material. We take into consideration two states of the learner’s knowledge domain: a) no 
prior knowledge and b) prior knowledge. 

There appears to be an enhancement in learning with multimedia, especially for learners with low prior 
knowledge for whom the rich multimedia environment may also positively influence motivation and 
engagement. Mayer (1993) believes multimedia information is more effective for learners with low prior 
knowledge or aptitude in the domain being learned because it helps them build a cognitive model or to 
connect the new knowledge to prior knowledge. 

In contrast, learners with high domain knowledge have a rich source of prior knowledge that can be
connected to new knowledge. In other words, they can make referential connections or build cognitive
models with text alone. Furthermore, it is also believed that textual information forces learners with prior
knowledge in the domain being learned to expend more effort to read and understand
the information, resulting in improved long-term encoding of the information (Najjar, 1996).

In the following sections we discuss the characteristics and advantages of each of the two combinations. 

Hypertext and pictures 

Reading text is a complex process. Learners expend more effort when reading, in order to understand 
information. This results in improved long-term encoding of the information. 

In many cases, the use of text alone is not sufficient to help the reader understand concepts and ideas. It
has been shown that incorporating pictures within text enhances the learning process. Visual
illustrations make abstract ideas and complex information more concrete and easier to comprehend.
Pictures seem to allow very rich cognitive encoding that results in surprisingly high recognition rates (Najjar,
1996).

The combination of text and pictures is effective because it allows for the concurrent coding of verbal and
non-verbal information. There is evidence that adding voice to a textual presentation
degrades learning.

Animation, voice and text labels 

Voice is a more realistic and natural mode of presenting information than displayed text, because of the 
perception of the person behind the voice and the verbal loop (acoustic) we use to store information. Voice 
does not distract visual attention from stimuli such as diagrams, and is therefore more engaging. It is good 
for conveying temporal information (Shih & Alessi, 1996). 

Despite the potential advantages discussed above, voice suffers from two problems: a) it is ephemeral.
Voice lasts only for an instant in time and is more difficult for a learner to control, whereas
text remains in front of the eyes of the reader for a longer period of time, which makes it more suitable than
voice for studying and critically analyzing the educational material; and b) it is difficult to search through
for a specific piece of information, a problem which does not occur in the case of text. These
characteristics point to the view that text is more appropriate for the in-depth investigation of the
educational material.

Animation is a motivating way to present information visually. Studies (Najjar, 1996) have shown that
information presented in the form of animation appears to be more effective for learners without prior
knowledge or aptitude in the domain being learned. In contrast, learners with high domain knowledge
have a rich source of prior knowledge that can be connected to the new knowledge. In this case, we can
exploit the advantages of combined text and picture presentation techniques.

By using text labels, we facilitate the creation of a schema by giving a cue to learners to focus their 
attention on the most important points during a certain portion of a narration. The combination of 
animation and voice is effective because we use non-verbal and verbal information through two sensory 
modalities. 

The interface design 

The above research has important implications for the design and practice of multimedia instruction and 
has served as the basis for the particular user interface design we have adopted. On the one hand, an effort 
was made to be consistent with the findings of previous studies of working memory resource limits and 
cognitive load principles. On the other, we have tried to create a flexible interface that can accommodate 
diverse learning preferences. Flexibility refers to the learner’s ability to select a different presentation style 
from the one initially recommended. 
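
The relationship between the recommended presentation style and the learner's freedom to override it can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the names Style, recommended_style and effective_style are hypothetical and do not come from the implemented system, and the mapping from prior knowledge to a style simply follows the two combinations discussed above.

```python
from enum import Enum
from typing import Optional


class Style(Enum):
    # The two complementary combinations discussed in the text.
    ANIMATION_VOICE_LABELS = "animation + voice + text labels"
    HYPERTEXT_PICTURES = "hypertext + static pictures"


def recommended_style(has_prior_knowledge: bool) -> Style:
    """Hypothetical helper: the style suggested for a given knowledge state."""
    if has_prior_knowledge:
        return Style.HYPERTEXT_PICTURES
    return Style.ANIMATION_VOICE_LABELS


def effective_style(has_prior_knowledge: bool,
                    learner_choice: Optional[Style] = None) -> Style:
    """Flexibility: an explicit learner selection overrides the recommendation."""
    if learner_choice is not None:
        return learner_choice
    return recommended_style(has_prior_knowledge)
```

In this sketch a learner without prior knowledge is offered animation, voice and text labels by default, but can still call for the text-and-pictures style by passing it as learner_choice.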




Figure 1 Figure 2 



Figure 1 shows the default user interface displayed to the learner upon commencement of a typical 
instructional unit. The screen has been divided into three frames, one of which (the frame containing the 
text and static pictures) is not initially visible to the user. In trying to benefit from the processes of schema 
acquisition and chunking, we have taken care not to present learners with too many different pieces of 
information or ideas at the same time. 

Each lesson has been divided into smaller chunks of related information called instructional units. The top
frame contains the title of the lesson as well as links to the associated chunks which comprise the particular
lesson. The frame below the title contains the multimedia content, including animation, explanatory text
labels and the accompanying auditory information. The explanatory text labels have been
incorporated into the animation, in accordance with the principle of spatial contiguity, which also mitigates the
ephemeral character of voice. This frame also includes buttons for controlling the flow of
the presentation. For example, the learner can pause and restart the animation and voice, change the
volume of the voice, etc. Links to related chunks of information are also provided through the use of hot
spots (temporal or spatial links) on the diagram. This navigational feature is supported by the SMIL
language (discussed below).

The learner can reveal the hidden frame, as shown in Figure 2. This allows for simultaneous viewing of 




Figure 3 portrays an alternative presentation mode. After the learner has viewed the presentation once (the 
recommended but not obligatory viewing method), the animation frame may be completely closed so that 
the learner may study the material in the form of text together with accompanying static pictures. 

This process is believed to: 

• initially increase learner motivation and engagement during the first viewing stage, when prior
knowledge of the domain being learned is still nonexistent or at a considerably low level,

• increase learning outcomes during this first stage through a combination of animation and voice, and

• have a positive impact on understanding and retention of the presented material due to the use of text
and static pictures in the second stage of its viewing.

SMIL and other implementation issues



The implementation of the flexible user interface for the multimedia material is based mainly on SMIL
(Synchronized Multimedia Integration Language). Using SMIL, it is possible to define screen regions, associate
media objects with the regions, and synchronize the appearance of media objects.
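
To make these three capabilities concrete, the following Python sketch assembles a small SMIL 1.0-style document with the standard library's xml.etree.ElementTree and prints it. It is a minimal illustration rather than the markup of the actual courses: the region names and sizes, the timing values, the host media.example.org and the file names are all assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def build_unit(anim_src: str, voice_src: str, label_src: str) -> ET.Element:
    """Build a SMIL-1.0-style tree: two regions and three synchronized media objects."""
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout", {"width": "640", "height": "400"})
    # Screen regions (all coordinates are illustrative assumptions).
    ET.SubElement(layout, "region", {"id": "animation", "left": "0", "top": "40",
                                     "width": "640", "height": "360"})
    # The label region lies on top of the animation (higher z-index),
    # so labels appear next to the part of the picture they describe.
    ET.SubElement(layout, "region", {"id": "labels", "left": "20", "top": "60",
                                     "width": "200", "height": "40",
                                     "z-index": "1"})
    # <par> plays its children in parallel, i.e. synchronized in time.
    par = ET.SubElement(ET.SubElement(smil, "body"), "par")
    ET.SubElement(par, "animation", {"src": anim_src, "region": "animation"})
    ET.SubElement(par, "audio", {"src": voice_src})
    # The label appears 5 s after the group starts and stays for 10 s.
    ET.SubElement(par, "text", {"src": label_src, "region": "labels",
                                "begin": "5s", "dur": "10s"})
    return smil


# Hypothetical media locations, used only for illustration.
unit = build_unit("rtsp://media.example.org/unit1/animation.swf",
                  "rtsp://media.example.org/unit1/narration.rm",
                  "rtsp://media.example.org/unit1/label1.txt")
print(ET.tostring(unit, encoding="unicode"))
```

The layout section covers the definition of regions, the region attribute on each media object ties it to a region, and the par container together with begin and dur expresses the synchronization.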

The main reasons that we selected SMIL were: 

• it can integrate and co-ordinate many diverse types of multimedia information, synchronizing one or
more animation files with voice and also text labels,

• it allows us to define spatial and temporal links, and

• it can be considered an “open”, platform-independent technology based on W3C XML.
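
As an illustration of the second point, the short Python sketch below builds one media object carrying a spatial and a temporal link, using the anchor element of SMIL 1.0. The target file names, the coordinates and the time interval are hypothetical values chosen for the example and are not taken from the actual course material.

```python
import xml.etree.ElementTree as ET

# One media object with two hot spots, written as SMIL-1.0-style <anchor> children.
animation = ET.Element("animation", {
    "src": "rtsp://media.example.org/unit1/animation.swf",  # hypothetical URL
    "region": "animation",
})
# Spatial link: active only inside the rectangle left,top,right,bottom (pixels).
ET.SubElement(animation, "anchor", {"href": "unit2.smil",
                                    "coords": "20,30,180,120"})
# Temporal link: active only between 10 s and 25 s of the media object's playback.
ET.SubElement(animation, "anchor", {"href": "unit3.smil",
                                    "begin": "10s", "end": "25s"})
print(ET.tostring(animation, encoding="unicode"))
```

A spatial link restricts the active area of the hot spot, while a temporal link restricts the interval during which it can be followed; the two attributes can also be combined on a single anchor.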

One of the fundamental problems encountered during the implementation of a multimedia, Web-based
course is undoubtedly the limited bandwidth of the Internet. This is perhaps one of the main factors that
explain the lack of extremely high-quality multimedia material. We use streaming technology to address
this problem.

The product selected was RealServer, created by RealNetworks. This streaming technology makes
use of RTSP (Real Time Streaming Protocol) instead of HTTP. Figure 4 shows the
architecture of the system. There are two servers: a Web server provides conventional HTML pages, while a
streaming media server supplies rich multimedia content to the learner.

Conclusion

Until recently, on-line courses available to university students worldwide were in the form of text-based
material with static pictures.

Recent studies have provided evidence for the claim that multimedia has a positive effect on the learning 
process. However, we need to take into consideration many factors regarding the way learners receive, 
encode, store and process information. In addition, a central matter of concern seems to be the level of 
prior knowledge. Bearing this factor in mind, we have proposed a specific user interface design suitable 
for use through the Internet.